New & upvoted


Quick takes

Linch · 19h
I'm starting an internal red-teaming project at Forethought against Forethought's AI propensity targeting/model character work. The theory of change for internal red-teaming is that someone (me!) spending dedicated, focused time on the negative case can find and elaborate on holes that Forethought's existing feedback processes will miss.

Possible good outcomes from the red-teaming project (roughly in decreasing order of expected impact):

1. I identify and come up with good reasons to wind down Forethought's work in this field, iff it is correct to do so.
   - Note that "identify" in this case may involve original research, but it may instead be mostly about sourcing the ideas from other people.
2. I identify ways the current direction of propensity targeting work is bad and suggest significant directional changes.
3. People don't change high-level direction, but I understand the counterarguments to this work and can make them legible enough within Forethought that people are aware of them and make fewer mistakes in their research.
4. I change my own mind significantly on the case for this work and decide to double down on it myself.
5. I identify sufficiently concrete cruxes that we can use to decide whether to prioritize this work in the future (e.g. stop/wind down this work if XYZ happens).

I think I'd find it very helpful to understand why people are skeptical of AI propensity targeting work, either in general or by Forethought in particular. For me, the most useful critiques will elaborate on why the work is net-negative (ideally significantly) or why it's almost certain to be useless, though other critiques are welcome as well.

I'd really appreciate comments from people here! Links to longer write-ups/existing comments are especially helpful. I might also want to chat on a call with people if anybody's interested. I've already read comments here, here, and various critical writings on Anthropic's Constitution and "soul doc" (which is related though
The Survival and Flourishing Fund, a virtual fund backed by philanthropist Jaan Tallinn, is organizing the distribution of approximately $14–28MM in grants via the S-Process Main and Theme Rounds this Fall, with applications from organizations due throughout this summer. Applications are essential to enabling the grant recommendation team to learn about and debate the pros and cons of each organization under consideration, both old and new. So please encourage applications from any awesome charitable projects you know about that are trying to support humanity's long-term survival and flourishing! Check out our complete announcement to learn more about the deadlines, updates, and details of the round: https://survivalandflourishing.fund/2026/application
Recently, I've been mulling over the question of whether or not it would be a good idea to join a frontier AI company's safety team for the purpose of reducing extinction risk. One of my big cons was something like: Jay, you think the incentives are less likely to affect you compared to most people. But most AI safety people who join frontier labs probably think this. You will be affected as well.

So I decided on a partial mitigation strategy, entirely as a precautionary principle and not at all because I thought I needed to. I committed to myself and to several people I'm close to that if I were to join a frontier lab safety team, I would donate 100% of the surplus that I would gain as a result of taking that job instead of a less lucrative job somewhere else.

At the time, I was applying for a few jobs, one of which was at a frontier company. Approximately immediately, my System 1 became way less interested in that job. And I didn't even have an offer in hand for a specific amount of money.

I don't have good reasons to care a lot about getting more money for myself. I have enough already, and I voluntarily live well below my means. This did not stop the effect from existing, and I hadn't noticed the effect before. I still don't notice the effect on my thinking in a vacuum; I only notice it by doing a mental side-by-side comparison.

I now think anyone who is considering joining a frontier company in order to reduce extinction risk should make this same commitment as a basic defensive measure against perverse incentives. I am sure there exist people who are entirely indifferent to money in this way; this is at least partially a skill issue on my part. But it does seem that "thinking you are indifferent to the money" is not a reliable signal that your thinking is unaltered by it.

This is also an opportunity to say that, if I ever do join a frontier safety team, I officially give you permission to ask me if I'm meeting this commitment of mine in conversation, even if
Heads up: we've got new job-board email alerts that you can use to automatically get notified about impactful new roles as soon as they're published.

As you may have noticed, we released an exciting new version of our job board last week. One thing we didn't cover much in our main announcement is how much we've improved our job alerts. These alerts are probably the most time-efficient way to find relevant roles, so it was a priority for us to make them as good as possible. We've made a bunch of improvements, but the main theme is that they offer much more customizability and look much better.

If you've been meaning to set up an email alert for the board, treat this as your nudge. It only takes a few seconds: just select your filters on the board, then hit the green "Get job alerts" button and enter your email. As always, please let us know if you have any suggestions or encounter any bugs!
Lizka · 4d
I've found the following abstract frame/set of heuristics useful for thinking about how we can try to affect (or predict) the long-term future: "How do we want to spend our precision/reach points? And can we spend them more wisely?"

[Meta: This is a rough, abstract, and pretty rambly note with assorted links; I'm just trying to pull some stuff out and synthesize it in a way I can more easily reference later (hoping to train habits along these lines). I don't think the ideas here are novel, and honestly I'm not sure who'd find this useful/interesting. (I might also keep editing it as I go.)]

----------------------------------------

An underlying POV here is that (a) scope and (b) precision are in tension. (Alts: (a) "ambition / breadth / reach / ..." vs. (b) "predictability / fidelity / robustness / ...".) You can aim at something specific and nearby [high precision, limited reach] or at something larger, farther away, and fuzzier [low precision, broad reach]. And if you care about the kind of effect you're having (you want to make X happen, not just to have influence ~for influence's sake), this matters a bunch.

Importantly, I think there are "architectural" features of the world/reality[1] that can ease this tension somewhat if they're used properly; if you channel your effort through them, you can transmit an intervention without it dissipating (or getting warped) as much as it otherwise would. Any channels like this will still be leaky (and they're limited), but this sort of "structure" seems like the main thing to look for if you're hoping to think about or improve the long-term future. (See a related sketch diagram here. I also often picture something like: "what levers could reach across a paradigm shift?" (or: what features are invariant in relevant ways?))

----------------------------------------

Some examples / thinking this through a bit:

1. Trying to organize or steer a social movement (/big group of people) might extend your reach, but